1 Introduction

Many signal categories in vision and auditory problems are invariant to the action of transformation 
groups, such as translations, rotations or frequency transpositions. This property motivates the study 
of signal representations which are also invariant to the action of these transformation groups. For 
instance, translation invariance can be achieved with a registration or with auto-correlation measures. 

Transformation groups are in fact low-dimensional manifolds, and therefore mere group invariance 
is in general not enough to efficiently describe signal classes. Indeed, signals may be perturbed with 
additive noise and also with geometrical deformations, so one can then ask for invariant representations
which are stable to these perturbations. Scattering convolutional networks [1] construct locally
translation invariant signal representations, with additive and geometrical stability, by cascading
complex wavelet modulus operators with a lowpass smoothing kernel. By defining wavelet decompositions
on any locally compact Lie Group, scattering operators can be generalized and cascaded to
provide local invariance with respect to more general transformation groups [2, 3]. Although such
transformation groups are present across many recognition problems, they require prior information 
which sometimes cannot be assumed. 

Convolutional networks [4] cascade filter banks with point-wise nonlinearities and local pooling
operators. By remapping the output of each layer with the input of the following one, the trainable 
filters implement convolution operators. We show that the invariance properties built by deep convo- 
lutional networks can be cast as a form of stable group invariance. The network wiring architecture 
determines the invariance group, while the trainable filter coefficients characterize the group action. 

Deep convolutional architectures cascade several layers of convolutions, non-linearities and pooling. 
These architectures have the capacity to generate local invariance to the action of more general 
groups. Under appropriate conditions, these groups can be factorized as products of smaller groups. 
Each of these factors can then be associated with a subset of consecutive layers of the convolutional 
network. In these conditions, the invariance properties of the final representation can be studied 
from the group structure generated by each layer. 



2 Problem statement 
2.1 Stable Group Invariance 

A transformation group G acts on the input space X (assumed to be a Hilbert space) with a linear 
group action $(g, x) \mapsto g.x \in X$, which is compatible with the group operation.

A signal representation $\Phi : X \to Z$ is invariant to the action of G if $\forall g \in G,\ x \in X,\ \Phi(g.x) =
\Phi(x)$. However, mere group invariance is in general too weak, due to the presence of a much
larger, high dimensional variability which does not belong to the low-dimensional group. It is then
necessary to incorporate the notion of outer "deformations" with another group action $\varphi : H \times
X \to X$, where H is a larger group containing G. The geometric stability can be stated with a
Lipschitz continuity property

$\|\Phi(\varphi(h,x)) - \Phi(x)\| \leq C \|x\| \, k(h,G)$ , (1)

where k(h, G) measures the "distance" from h to the invariance group G. For instance, when G is
the translation group of $\mathbb{R}^d$ and $H \supset G$ is the group of $C^2$ diffeomorphisms of $\mathbb{R}^d$, then $\varphi(h, x) =
x \circ h$ and one can select as distance the elastic deformation metric $k(h, G) := |\nabla \tau|_\infty + |H\tau|_\infty$,
where $\tau(u) = h(u) - u$ [1].
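
As a rough numerical illustration of the elastic deformation metric, the sketch below estimates $|\nabla \tau|_\infty + |H\tau|_\infty$ in one dimension with finite differences. The function name `deformation_size`, the sampling interval and the two example maps are assumptions chosen for illustration only; they do not come from the text above.

```python
import math

def deformation_size(h, lo=0.0, hi=2 * math.pi, n=2000):
    """Estimate sup|tau'| + sup|tau''| for tau(u) = h(u) - u sampled on [lo, hi]."""
    step = (hi - lo) / n
    grid = [lo + i * step for i in range(n + 1)]
    tau = [h(u) - u for u in grid]
    # Central finite differences for the first and second derivatives of tau.
    d1 = [(tau[i + 1] - tau[i - 1]) / (2 * step) for i in range(1, n)]
    d2 = [(tau[i + 1] - 2 * tau[i] + tau[i - 1]) / step ** 2 for i in range(1, n)]
    return max(abs(v) for v in d1) + max(abs(v) for v in d2)

def translation(u):
    # Rigid shift: an element of the translation group G (hypothetical example).
    return u + 0.7

def warp(u):
    # Small non-rigid warping with tau(u) = 0.05 sin(u) (hypothetical example).
    return u + 0.05 * math.sin(u)

k_translation = deformation_size(translation)  # close to 0: h belongs to G
k_warp = deformation_size(warp)                # close to 0.05 + 0.05
```

A rigid translation has a constant displacement field, so both derivative terms vanish, whereas the warp is at a small but nonzero distance from G.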

Even though the group invariance formalism describes global invariance properties of the represen- 
tation, it also provides a valid and useful framework to study local invariance properties. Indeed, if 
one replaces (1) by

$\|\Phi(\varphi(h,x)) - \Phi(x)\| \leq C \|x\| \, (\|h_G\|_G + k(h,G))$ , (2)

where $h_G$ is a projection of h to G and $\|g\|_G$ is a metric on G measuring the amount of transformation
being applied, then the local invariance is expressed by adjusting the proportionality between
the two metrics.

2.2 Convolutional Networks 

A generic convolutional network defined on a space $X = L^2(\Omega_0)$ of square-integrable signals starts
with a filter bank $\{\psi_\lambda\}_{\lambda \in \Lambda_1}$, $\psi_\lambda \in L^1(\Omega_0)\ \forall \lambda$, which for each input $x(u) \in X$ produces the
collection

$z^{(1)}(u, \lambda) = x \star \psi_\lambda(u) = \int x(u - v)\, \psi_\lambda(v)\, dv$ , $u \in \Omega_0$ , $\lambda \in \Lambda_1$ .

If the filter bank defines a stable, invertible frame, then there exist two constants a, A > 0 such that

$\forall x ,\ a \|x\| \leq \|z^{(1)}\| \leq A \|x\|$ ,

where $\|z^{(1)}\|^2 = \sum_{\lambda \in \Lambda_1} \|z^{(1)}(\cdot, \lambda)\|^2$. By defining $\Omega_1 = \Omega_0 \times \Lambda_1$, the first layer of the network
can be written as the linear mapping

$F_1 : L^2(\Omega_0) \to L^2(\Omega_1)$

$x(u) \mapsto z^{(1)}(u, \lambda)$ .

$z^{(1)}$ is then transformed with a point-wise nonlinear operator $M : L^2(\Omega) \to L^2(\Omega)$ which is usually
non-expansive, meaning that $\|Mz\| \leq \|z\|$. Finally, a local pooling operator P can be defined as
any linear or nonlinear operator

$P : L^2(\Omega) \to L^2(\tilde\Omega)$

which reduces the resolution of the signal along one or more coordinates and which avoids "aliasing".
If $\Omega = \Omega_0 \times \Lambda_1 \times \dots \times \Lambda_k$ and $(2^{J_0}, \dots, 2^{J_k})$ denote the loss of resolution along each coordinate, it
results that $\tilde\Omega = \tilde\Omega_0 \times \tilde\Lambda_1 \times \dots \times \tilde\Lambda_k$, with $|\tilde\Omega_0| = 2^{-\alpha J_0} |\Omega_0|$, $|\tilde\Lambda_i| = 2^{-\alpha J_i} |\Lambda_i|$, where $\alpha$ is an oversampling
factor. Linear pooling operators are implemented as lowpass filters $\phi_J(u, \lambda_1, \dots, \lambda_m)$ followed
by a downsampling.

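Then, a k-layer convolutional network is a cascade

$L^2(\Omega_0) \xrightarrow{M \circ F_1} L^2(\Omega_1) \xrightarrow{P_1} L^2(\tilde\Omega_1) \xrightarrow{M \circ F_2} L^2(\Omega_2) \cdots \xrightarrow{P_k} L^2(\tilde\Omega_k)$ , (3)

which produces successively $z^{(2)}, \dots, z^{(k)}$.

The filter banks $(F_i)_{i \leq k}$, together with the pooling operators $(P_j)_{j \leq k}$, progressively transform the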
signal domain; filter bank steps lift the domain of definition by adding new coordinates, whereas 
pooling steps reduce the resolution along certain coordinates. 
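
The way one layer changes the signal domain can be sketched in a few lines. This is a minimal illustration under stated assumptions: periodic boundary conditions, three arbitrary short filters, a box lowpass filter as pooling, and critical downsampling by $2^J$. The names `circular_convolution` and `one_layer` are hypothetical.

```python
def circular_convolution(x, h):
    """Return (x * h)(u) = sum_v x(u - v) h(v) with periodic boundary conditions."""
    n = len(x)
    return [sum(x[(u - v) % n] * h[v] for v in range(len(h))) for u in range(n)]

def one_layer(x, filter_bank, J):
    # F: the filter bank lifts the domain from Omega_0 to Omega_0 x Lambda_1.
    lifted = [circular_convolution(x, h) for h in filter_bank]
    # M: point-wise modulus, a non-expansive nonlinearity.
    rectified = [[abs(v) for v in channel] for channel in lifted]
    # P: box lowpass filter of width 2^J followed by downsampling by 2^J.
    width = 2 ** J
    pooled = [[sum(channel[i:i + width]) / width
               for i in range(0, len(channel), width)]
              for channel in rectified]
    return lifted, pooled

x = [float((3 * u) % 5) for u in range(16)]                  # arbitrary test signal
filter_bank = [[0.5, 0.5], [0.5, -0.5], [0.25, -0.5, 0.25]]  # arbitrary filters
lifted, pooled = one_layer(x, filter_bank, J=2)
shape_lifted = (len(lifted[0]), len(lifted))  # (16, 3): a new coordinate lambda is added
shape_pooled = (len(pooled[0]), len(pooled))  # (4, 3): resolution along u is reduced
```

The filter bank adds the coordinate $\lambda$ without changing the resolution along u, while pooling shrinks the u axis by the factor $2^J$.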

3 Invariance Properties of Convolutional Networks 
3.1 The case of one-parameter transformation groups 

Let us start by assuming the simplest form of variability produced by a transformation group. A 
one-parameter transformation group is a family $\{U_t\}_{t \in \mathbb{R}}$ of unitary linear operators of $L^2(\Omega)$ such
that (i) $t \mapsto U_t$ is strongly continuous: $\lim_{t \to t_0} U_t z = U_{t_0} z$ for every $z \in L^2(\Omega)$, and (ii)
$U_{t+s} = U_t U_s$. One-parameter transformation groups are thus homeomorphic to $\mathbb{R}$ (with addition
as group operation), and define an action which is continuous in the group variable. Uni-dimensional
translations $U_t x(u) = x(u - t v_0)$, frequency transpositions $U_t x = \mathcal{F}^{-1}(\mathcal{F} x(\omega - t \omega_0))$ (where $\mathcal{F}$,
$\mathcal{F}^{-1}$ are respectively the forward and inverse Fourier transform) or unitary dilations $U_t x(u) =
2^{-t/2} x(2^{-t} u)$ are examples of one-parameter transformation groups.

One-parameter transformation groups are particularly simple to study thanks to Stone's theorem [5],
which states that unitary one-parameter transformation groups are uniquely generated by a complex
exponential of a self-adjoint operator:

$U_t = e^{itA}$ , $t \in \mathbb{R}$ .

Here, the complex exponential of a self-adjoint operator should be interpreted in terms of its spectra.
In the finite dimensional case (when $\Omega$ is discrete), this means that there exists an orthogonal
transform O such that if $\hat z(\omega) = Oz$, then

$\forall z ,\ U_t z = O^{-1} \mathrm{diag}(e^{it\omega}\, \hat z(\omega))$ . (4)

In other words, the group action can be expressed as a linear phase change in the basis which 
diagonalizes the unique self-adjoint operator A given by Stone's theorem. In the particular case of 
translations, the change of basis O is given by the Fourier transform. As a result, one can obtain 
a representation which is invariant to the action of {Ut}t with a single layer of a neural network: 
a linear decomposition which expresses the data in the basis given by O followed by a point-wise 
complex modulus. In the case of the translation group, this corresponds to taking the modulus of the 
Fourier transform. 
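
The translation case can be checked numerically. The sketch below is only an illustration, assuming a finite periodic domain and integer shifts; it verifies that a circular translation acts as a linear phase change in the discrete Fourier basis and leaves the modulus unchanged. The helper names `dft` and `translate` are hypothetical.

```python
import cmath

def dft(x):
    """Discrete Fourier transform, playing the role of the change of basis O."""
    n = len(x)
    return [sum(x[u] * cmath.exp(-2j * cmath.pi * k * u / n) for u in range(n))
            for k in range(n)]

def translate(x, t):
    """Circular translation U_t x(u) = x(u - t)."""
    n = len(x)
    return [x[(u - t) % n] for u in range(n)]

x = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]  # arbitrary test signal
t, n = 3, len(x)
x_hat = dft(x)
shifted_hat = dft(translate(x, t))

# The group action is a linear phase change in the diagonalizing basis.
phase_error = max(abs(shifted_hat[k] - cmath.exp(-2j * cmath.pi * k * t / n) * x_hat[k])
                  for k in range(n))
# Taking the point-wise complex modulus removes the phase: the result is invariant.
modulus_gap = max(abs(abs(a) - abs(b)) for a, b in zip(x_hat, shifted_hat))
```

Both `phase_error` and `modulus_gap` vanish up to floating-point rounding.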

3.2 Presence of deformations 

Stone's theorem provides a recipe for global group invariance for strongly continuous group actions. 
Without noise nor deformations, an invariant representation can be obtained by taking complex 
moduli in a basis which diagonalizes the group action, which can be implemented in a shallow 1- 
layer architecture. However, the underlying low-dimensional assumption is rarely satisfied, due to 
the presence of more complex forms of variability. 

This complex variability can be modeled as follows. If O is the basis which diagonalizes a given one-parameter
group, then the group action is expressed in the basis $\mathcal{F}^{-1}$ as the translation operator
$T_s z(u) = z(u - s)$. Whereas the group action consists in rigid translations on this basis, by analogy
a deformation is defined as a non-rigid warping in this domain: $L_\tau z(u) = z(u - \tau(u))$, where $\tau$ is
a displacement field along the indexes of the decomposition.

The amount of deformation can be measured with the regularity of $\tau(u)$, which controls how distant
the warping is from being a rigid translation and hence an element of the group. This suggests
that, in order to obtain stability to deformations, rather than looking for eigenvectors of the infinitesimal
group action, one should look for linear measurements which are well localized in the
domain where deformations occur, and which nearly diagonalize the group action. In particular,
these measurements can be implemented with convolutions using compactly supported filters, such
as in convolutional networks.

Let $z^{(n)}(u, \lambda_1, \dots, \lambda_n)$ be an intermediate representation in a convolutional network whose
first layer is fully connected. Suppose that G is a group acting on z via

$g.z^{(n)}(u, \lambda_1, \dots, \lambda_n) = z^{(n)}(f(g,u), \lambda_1 + \eta(g), \lambda_2, \dots, \lambda_n)$ , (5)

where $\eta : G \to \Lambda_1$. This corresponds to the idealized case where the transformation only modifies
one component of the representation. A local pooling operator along the variable $\lambda_1$, at a certain
scale $2^J$, attenuates the transformation by g as soon as $|\eta(g)| \ll 2^J$. It thus produces local invariance
with respect to the action of G.
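
The attenuation produced by local pooling can be illustrated with a box average. The sketch below is a simplified model under stated assumptions: a one-dimensional periodic coordinate $\lambda_1$, a moving average of width $2^J$ without downsampling, and an arbitrary test signal. For this particular pooling operator, a shift by $\eta$ changes each pooled coefficient by at most $2 \eta \max|z| / 2^J$. The helper names are hypothetical.

```python
def shift(z, eta):
    """Group action along lambda_1: (g.z)(l) = z(l + eta), with periodic indexing."""
    n = len(z)
    return [z[(l + eta) % n] for l in range(n)]

def local_pool(z, J):
    """Circular moving average of width 2^J along lambda_1 (no downsampling)."""
    n, width = len(z), 2 ** J
    return [sum(z[(l - v) % n] for v in range(width)) / width for l in range(n)]

def max_gap(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

z = [1.0 if (l * l) % 7 < 3 else -1.0 for l in range(64)]  # irregular +/-1 pattern
eta, J = 1, 5

raw_gap = max_gap(shift(z, eta), z)                                   # no pooling
pooled_gap = max_gap(local_pool(shift(z, eta), J), local_pool(z, J))  # with pooling
bound = 2 * eta * max(abs(v) for v in z) / 2 ** J                     # eta / 2^J scaling
```

Without pooling the shift can change a coefficient by the full dynamic range of the signal, while after pooling the change is controlled by the ratio $\eta / 2^J$.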

3.3 Group Factorization with Deep Networks 

Deep convolutional networks have the capacity to learn complex relationships of the data and to 
build invariance with respect to a large family of transformations. These properties can be partly 
explained in terms of a factorization of the invariance groups performed successively. 

Whereas pooling operators efficiently produce stable local invariance, convolution operators preserve
the invariance generated by previous layers. Indeed, suppose $z^{(n)}(u, \lambda_1)$ is an intermediate
representation in a convolutional network, and that G acts on $z^{(n)}$ via $g.z^{(n)}(u, \lambda_1) =
z^{(n)}(f(g, u), \lambda_1 + \eta(g))$. It follows that if the next layer is constructed as

$z^{(n+1)}(u, \lambda_1, \lambda_2) := z^{(n)}(u, \cdot) \star \psi_{\lambda_2}(\lambda_1)$ ,

then G acts on $z^{(n+1)}$ via $g.z^{(n+1)}(u, \lambda_1, \lambda_2) = z^{(n+1)}(f(g,u), \lambda_1 + \eta(g), \lambda_2)$, since convolutions
commute with the group action, which by construction is expressed as a translation in the coefficients
$\lambda_1$. The new coordinates $\lambda_2$ are thus unaffected by the action of G.
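
The commutation argument can be verified on a small discrete example. The sketch below assumes periodic index sets, integer-valued data so that the comparison is exact, and a hypothetical choice $f(g, u) = u + s$ for the action on u; none of these choices come from the text.

```python
def act(z, eta, shift_u):
    """g.z(u, l1, ...) = z(u + shift_u, l1 + eta, ...); trailing coordinates are untouched."""
    n_u, n_l = len(z), len(z[0])
    return [[z[(u + shift_u) % n_u][(l + eta) % n_l] for l in range(n_l)]
            for u in range(n_u)]

def next_layer(z, filters):
    """z_next(u, l1, l2) = sum_v z(u, l1 - v) psi_{l2}(v): circular convolution along l1."""
    n_l = len(z[0])
    return [[[sum(row[(l - v) % n_l] * psi[v] for v in range(len(psi)))
              for psi in filters]
             for l in range(n_l)]
            for row in z]

z = [[(u * u + 3 * l * l + u * l) % 11 for l in range(8)] for u in range(4)]  # arbitrary data
filters = [[1, -1], [1, 2, 1]]  # arbitrary filters psi_{l2}, indexed by l2
eta, shift_u = 3, 1

transform_then_filter = next_layer(act(z, eta, shift_u), filters)
filter_then_transform = act(next_layer(z, filters), eta, shift_u)
commutes = transform_then_filter == filter_then_transform
```

Applying the group action before or after the new layer gives the same coefficients, and the index $\lambda_2$ is never moved by the action.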

As a consequence, this property enables a systematic procedure to generate invariance to groups
of the form $G = G_1 \rtimes G_2 \rtimes \dots \rtimes G_s$, where $H_1 \rtimes H_2$ is the semidirect product of groups. In
this decomposition, each factor $G_i$ is associated with a range of convolutional layers, along the
coordinates where the action of $G_i$ is perceived.

4 Perspectives 

The connections between group invariance and deep convolutional networks offer an interpretation 
of their efficiency on several recognition tasks. In particular, they might explain why the weight 
sharing induced by convolutions is a valid regularization method in presence of group variability. 

More concretely, we shall also concentrate on the following aspects: 

• Group Discovery. One might ask for the group of transformations which best explains the 
variability observed in a given dataset {xi}. In the case where no geometric deformations 
are present, one can start by learning the (complex) eigenvectors of the group action: 

$U^* = \arg\min_{U^T U = \mathbf{1}} \mathrm{var}(|U x_i|)$ .

When the data corresponds to a uniform measure on the group, then this decomposition
can be obtained from the diagonalization of the covariance operator $\Sigma = E(x^T x)$. In that
case, the real eigenvectors of $\Sigma$ are grouped into pairs of vectors with identical eigenvalue,
which then define the complex decomposition diagonalizing the group action.
In presence of deformations, the global invariance is replaced by a measure of local invariance.
This problem is closely related to the sparse coding with slowness from [6].

• Structured Convolutional Networks. Groups offer a powerful framework to incorporate 
structure into the families of filters, similarly as in [7]. On the one hand, one can enforce
global properties of the group by defining the convolutions accordingly. For instance, by 
wrapping the domain of the convolution, one is enforcing a periodic group to emerge. On 
the other hand, one could further regularize the learning by enforcing a group structure 
within a filter bank. For instance, one could ask a certain filter bank $F = \{h_1, \dots, h_n\}$ to
have the form $F = \{R_\theta h_0\}_\theta$, where $R_\theta$ is a rotation with an angle $\theta$.
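
The pairing of eigenvectors mentioned under Group Discovery can be checked in the simplest setting. The sketch below assumes the group of circular translations, a dataset made of the full orbit of one arbitrary signal (a uniform measure on the group), and the uncentered second-moment matrix as covariance. It verifies that a cosine and a sine of the same frequency are eigenvectors sharing one eigenvalue, so that together they form the complex Fourier vector diagonalizing the translation. The helper names are hypothetical.

```python
import math

def orbit_covariance(x):
    """Average of outer products over all circular translates of x."""
    n = len(x)
    orbit = [[x[(u - t) % n] for u in range(n)] for t in range(n)]
    return [[sum(row[i] * row[j] for row in orbit) / n for j in range(n)]
            for i in range(n)]

def eigen_residual(S, v, lam):
    """Largest entry of |S v - lam v|; zero when (lam, v) is an eigenpair of S."""
    n = len(v)
    return max(abs(sum(S[i][j] * v[j] for j in range(n)) - lam * v[i])
               for i in range(n))

x = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]  # arbitrary test signal
n, k = len(x), 1
S = orbit_covariance(x)

cos_k = [math.cos(2 * math.pi * k * u / n) for u in range(n)]
sin_k = [math.sin(2 * math.pi * k * u / n) for u in range(n)]

# Shared eigenvalue: squared modulus of the k-th Fourier coefficient of x, divided by n.
re = sum(x[u] * cos_k[u] for u in range(n))
im = -sum(x[u] * sin_k[u] for u in range(n))
lam = (re * re + im * im) / n

residual_cos = eigen_residual(S, cos_k, lam)
residual_sin = eigen_residual(S, sin_k, lam)
```

Both residuals vanish up to rounding, which is the discrete counterpart of grouping real eigenvectors into pairs with identical eigenvalue.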